
    Using natural language processing to improve biomedical concept normalization and relation mining

    This thesis concerns the use of natural language processing for improving biomedical concept normalization and relation mining. We begin by introducing the background of biomedical text mining, and subsequently describe a typical text mining pipeline, some key issues and problems in mining biomedical texts, and the possibility of using natural language processing to solve these problems. Finally, we end with an outline of the work done in this thesis.

    Training text chunkers on a silver standard corpus: Can silver replace gold?

    Background: To train chunkers in recognizing noun phrases and verb phrases in biomedical text, an annotated corpus is required. The creation of gold standard corpora (GSCs), however, is expensive and time-consuming. GSCs therefore tend to be small and to focus on specific subdomains, which limits their usefulness. We investigated the use of a silver standard corpus (SSC) that is automatically generated by combining the outputs of multiple chunking systems. We explored two use scenarios: one in which chunkers are trained on an SSC in a new domain for which a GSC is not available, and one in which chunkers are trained on an available but small GSC supplemented with an SSC. Results: We have tested the two scenarios using three chunkers, Lingpipe, OpenNLP, and Yamcha, and two different corpora, GENIA and PennBioIE. For the first scenario, we showed that the systems trained for noun-phrase recognition on the SSC in one domain performed 2.7-3.1 percenta
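
    A silver standard of this kind can be approximated by majority voting over the token-level chunk labels produced by the individual systems. The sketch below is illustrative only; the tag set, the tie-breaking rule, and the example chunker outputs are assumptions, not the combination procedure used in the paper.

        from collections import Counter

        # Hypothetical BIO-tagged outputs of three chunkers for the same sentence
        # (tags and agreement pattern are invented for illustration).
        chunker_outputs = [
            ["B-NP", "I-NP", "O", "B-VP", "B-NP"],   # Lingpipe-style output
            ["B-NP", "I-NP", "O", "B-VP", "I-NP"],   # OpenNLP-style output
            ["B-NP", "O",    "O", "B-VP", "B-NP"],   # Yamcha-style output
        ]

        def silver_tags(outputs, min_votes=2):
            """Combine per-token BIO tags by majority vote.

            Tokens on which fewer than `min_votes` systems agree are left
            unlabeled ('O'), so the silver corpus keeps only consensus chunks.
            """
            silver = []
            for token_tags in zip(*outputs):
                tag, votes = Counter(token_tags).most_common(1)[0]
                silver.append(tag if votes >= min_votes else "O")
            return silver

        print(silver_tags(chunker_outputs))
        # ['B-NP', 'I-NP', 'O', 'B-VP', 'B-NP']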

    ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

    Background: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. Results: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
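
    The trigger-based idea behind ConText and ContextD can be pictured as scanning a sentence for trigger terms and marking concept mentions that fall within a trigger's scope. A minimal sketch follows; the Dutch trigger words, the fixed scope window, and the function names are assumptions for illustration, not the actual ContextD rules or its 41 triggers.

        # Illustrative ConText-style negation check (not the actual ContextD rules).
        NEGATION_TRIGGERS = ["geen", "niet", "zonder"]  # hypothetical trigger list
        SCOPE = 5  # tokens after a trigger assumed to fall in its negation scope

        def is_negated(tokens, concept_index):
            """Return True if the token at `concept_index` lies within the
            forward scope of a preceding negation trigger."""
            for i, token in enumerate(tokens):
                if token.lower() in NEGATION_TRIGGERS:
                    if i < concept_index <= i + SCOPE:
                        return True
            return False

        tokens = "patient heeft geen koorts".split()
        print(is_negated(tokens, tokens.index("koorts")))  # True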

    Knowledge-based extraction of adverse drug events from biomedical text

    Background: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied the system for the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledg
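
    One way to realize such a knowledge-based module is to keep a drug and adverse-effect pair found in a sentence only if the knowledge source records a link between the two concepts. The sketch below is a heavily simplified illustration under assumptions (the abstract is truncated here, and the knowledge-base contents, concept names, and function names are invented), not the system described in the paper.

        # Hypothetical knowledge base of known drug -> adverse-event links
        # (a real system would use UMLS-style concept identifiers).
        KNOWN_ADE_LINKS = {
            ("amoxicillin", "rash"),
            ("ibuprofen", "gastric ulcer"),
        }

        def extract_ade_pairs(recognized_drugs, recognized_effects):
            """Return candidate (drug, adverse effect) pairs from one sentence
            that are supported by the knowledge base."""
            pairs = []
            for drug in recognized_drugs:
                for effect in recognized_effects:
                    if (drug, effect) in KNOWN_ADE_LINKS:
                        pairs.append((drug, effect))
            return pairs

        # Concepts found by the recognition module in one sentence (illustrative).
        print(extract_ade_pairs(["amoxicillin"], ["rash", "headache"]))
        # [('amoxicillin', 'rash')]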

    Using rule-based natural language processing to improve disease normalization in biomedical text

    Background and objective: In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. Methods: We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. Results: Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. Conclusions: We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.
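
    The exact-boundary evaluation used here reduces to counting span-level true positives, false positives, and false negatives against the gold standard. A minimal sketch with made-up character offsets; the corpus format and the inexact and concept-identifier matching variants are not shown.

        def prf_exact(gold_spans, system_spans):
            """Precision, recall, and F-score for exact boundary matching.

            Spans are (start, end) character offsets; inexact matching would
            instead count any overlapping gold/system pair as a hit.
            """
            gold, system = set(gold_spans), set(system_spans)
            tp = len(gold & system)
            precision = tp / len(system) if system else 0.0
            recall = tp / len(gold) if gold else 0.0
            f = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
            return precision, recall, f

        # Illustrative gold and system annotations (offsets are invented).
        gold = [(0, 8), (15, 27), (40, 52)]
        system = [(0, 8), (16, 27), (40, 52)]
        print(prf_exact(gold, system))  # (0.667, 0.667, 0.667) approximately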

    Probing Shadowed Nuclear Sea with Massive Gauge Bosons in the Future Heavy-Ion Collisions

    The production of the massive bosons $Z^0$ and $W^{\pm}$ could provide an excellent tool to study cold nuclear matter effects and the modifications of nuclear parton distribution functions (nPDFs) relative to parton distribution functions (PDFs) of a free proton in high energy nuclear reactions at the LHC, as well as in heavy-ion collisions (HIC) at the much higher center-of-mass energies available in future colliders. In this paper we calculate the rapidity and transverse momentum distributions of the vector bosons and their nuclear modification factors in p+Pb collisions at $\sqrt{s_{NN}}=63$ TeV and in Pb+Pb collisions at $\sqrt{s_{NN}}=39$ TeV in the framework of perturbative QCD by utilizing three parametrization sets of nPDFs: EPS09, DSSZ and nCTEQ. It is found that in heavy-ion collisions at such high colliding energies, both the rapidity distribution and the transverse momentum spectrum of vector bosons are considerably suppressed in wide kinematic regions with respect to p+p reactions due to the large nuclear shadowing effect. We demonstrate that in massive vector boson production, processes with sea quarks in the initial state may give larger contributions than those with valence quarks in the initial state; therefore, in future heavy-ion collisions the isospin effect is less pronounced and the charge asymmetry of the W boson will be reduced significantly compared to that at the LHC. A large difference between results with nCTEQ and results with EPS09 and DSSZ is observed in the nuclear modifications of both the rapidity and $p_T$ distributions of $Z^0$ and $W$ in the future HIC. Comment: 13 pages, 21 figures, version accepted for publication in Eur. Phys. J.
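
    The nuclear modification factors discussed here compare the per-nucleon cross section in the nuclear collision to the p+p baseline at the same energy. Under one common minimum-bias convention (the notation below is assumed, not taken from the paper):

        R_{p\mathrm{Pb}}(y) = \frac{1}{A}\,
            \frac{\mathrm{d}\sigma_{p\mathrm{Pb}}/\mathrm{d}y}{\mathrm{d}\sigma_{pp}/\mathrm{d}y},
        \qquad
        R_{\mathrm{PbPb}}(y) = \frac{1}{A^{2}}\,
            \frac{\mathrm{d}\sigma_{\mathrm{PbPb}}/\mathrm{d}y}{\mathrm{d}\sigma_{pp}/\mathrm{d}y}

    Shadowing of the nuclear sea-quark distributions then appears as $R < 1$ in the kinematic regions dominated by small-$x$ partons.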

    The CALBC Silver Standard Corpus for Biomedical Named Entities - A Study in Harmonizing the Contributions from Four Independent Named Entity Taggers

    The production of gold standard corpora is time-consuming and costly. We propose an alternative: the 'silver standard corpus' (SSC), a corpus that has been generated by the harmonisation of the annotations delivered by a selection of annotation systems. The systems have to share the type system for the annotations, and the harmonisation solution has to use a suitable similarity measure for the pair-wise comparison of the annotations. The annotation systems have been evaluated against the harmonised set (630,324 sentences, 15,956,841 tokens). We can demonstrate that the annotation of proteins and genes shows higher diversity across all annotation solutions used, leading to lower agreement against the harmonised set in comparison to the annotations of diseases and species. An analysis of the most frequent annotations from all systems shows that a high agreement amongst systems leads to the selection of terms that are suitable to be kept in the harmonised set. This is the first large-scale approach to generate an annotated corpus from automated annotation systems. Further research is required to understand how the annotations from different systems have to be combined to produce the best annotation result for a harmonised corpus.
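
    The pair-wise comparison of annotations mentioned above needs a similarity measure over annotated spans. A simple, commonly used choice is character-offset overlap with a threshold deciding whether two annotations are treated as the same entity; this is an assumed illustration, not necessarily the measure used in CALBC.

        def span_similarity(a, b):
            """Jaccard-style overlap between two annotations given as
            (start, end) character offsets on the same document."""
            overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
            union = max(a[1], b[1]) - min(a[0], b[0])
            return overlap / union if union else 0.0

        def agree(a, b, threshold=0.8):
            """Treat two annotations as equivalent if their overlap is high enough."""
            return span_similarity(a, b) >= threshold

        # Two taggers annotating roughly the same protein mention (offsets invented).
        print(span_similarity((100, 112), (100, 115)))  # 0.8
        print(agree((100, 112), (100, 115)))            # True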

    MCL-CAw: A refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure

    Abstract Background The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher-level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Over the years, several affinity scoring schemes have also been devised to improve the quality of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In the attempt to tackle this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracy of correctly predicted complexes. Results Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for the reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes, and with better accuracies, than MCL, particularly in the presence of natural noise; (ii) affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. Conclusions We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw.
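
    The core-attachment refinement can be pictured as a post-processing step on each MCL cluster: proteins with high weighted connectivity inside the cluster form the core, and the remaining proteins are kept as attachments only if they connect strongly enough to that core. The sketch below is a simplified illustration under assumed thresholds and scoring, not the published MCL-CAw procedure.

        # Simplified core-attachment refinement of one MCL cluster (not MCL-CAw itself).
        # `weights` maps unordered protein pairs to affinity scores; thresholds are assumptions.

        def refine_cluster(cluster, weights, core_fraction=0.5, attach_threshold=1.0):
            """Split a cluster into a densely connected core plus attachments."""
            def in_cluster_degree(p, members):
                return sum(weights.get(frozenset((p, q)), 0.0) for q in members if q != p)

            # Core: proteins whose weighted in-cluster degree reaches
            # `core_fraction` of the cluster maximum.
            degrees = {p: in_cluster_degree(p, cluster) for p in cluster}
            max_deg = max(degrees.values())
            core = {p for p, d in degrees.items() if d >= core_fraction * max_deg}

            # Attachments: non-core proteins connected to the core above a threshold.
            attachments = {p for p in cluster - core
                           if in_cluster_degree(p, core) >= attach_threshold}
            return core, attachments

        # Toy weighted PPI network with four proteins (scores invented).
        weights = {frozenset(p): w for p, w in [
            (("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "C"), 0.7), (("C", "D"), 0.4)]}
        print(refine_cluster({"A", "B", "C", "D"}, weights, attach_threshold=0.3))
        # ({'A', 'B', 'C'}, {'D'})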

    Constraints on Spin-Independent Nucleus Scattering with sub-GeV Weakly Interacting Massive Particle Dark Matter from the CDEX-1B Experiment at the China Jin-Ping Laboratory

    We report results of searches for weakly interacting massive particles (WIMPs) with sub-GeV masses ($m_{\chi}$) via WIMP-nucleus spin-independent scattering with the Migdal effect incorporated. Analyses of time-integrated (TI) and annual modulation (AM) effects on CDEX-1B data are performed, with 737.1 kg$\cdot$day exposure and a 160 eVee threshold for the TI analysis, and 1107.5 kg$\cdot$day exposure and a 250 eVee threshold for the AM analysis. The sensitive windows in $m_{\chi}$ are expanded by an order of magnitude to lower DM masses with the Migdal effect incorporated. New limits on $\sigma_{\chi N}^{\rm SI}$ at 90% confidence level are derived as $2\times10^{-32} \sim 7\times10^{-35}$ cm$^2$ for the TI analysis at $m_{\chi} \sim$ 50-180 MeV/$c^2$, and $3\times10^{-32} \sim 9\times10^{-38}$ cm$^2$ for the AM analysis at $m_{\chi} \sim$ 75 MeV/$c^2$-3.0 GeV/$c^2$. Comment: 5 pages, 4 figures